A computational auditory scene analysis system for robust speech recognition

نویسندگان

  • Soundararajan Srinivasan
  • Yang Shao
  • Zhaozhang Jin
  • DeLiang Wang
چکیده

We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask which retains the mixture in a local TF unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting T-F masks are used in conjunction with missing-data methods for recognition. Systematic evaluations on a speech separation challenge task show significant improvement over the baseline performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A computational auditory scene analysis system for speech segregation and robust speech recognition

A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, ...

متن کامل

CASA based speech separation for robust speech recognition

This paper introduces a speech separation system as a front-end processing step for automatic speech recognition (ASR). It employs computational auditory scene analysis (CASA) to separate the target speech from the interference speech. Specifically, the mixed speech is preprocessed based on auditory peripheral model. Then a pitch tracking is conducted and the dominant pitch is used as a main cu...

متن کامل

Martin Cooke , Phil Green and Malcolm Crawford HANDLING MISSING DATA IN SPEECH RECOGNITION

In this paper, we propose a new paradigm for robust ASR based on auditory scene analysis. In previous work, we have shown how models of auditory processing and grouping principles can be used to separate the evidence for a speech signal from arbitrary intrusions. However, this evidence will generally be incomplete since some spectrotemporal regions will be dominated by the other sources. Here, ...

متن کامل

Monaural speech separation based on MAXVQ and CASA for robust speech recognition

Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented to separate the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn ...

متن کامل

Connectionist Models for Auditory Scene Analysis

Although the visual and auditory systems share the same basic tasks of informing an organism about its environment, most connectionist work on hearing to date has been devoted to the very different problem of speech recognition . VVe believe that the most fundamental task of the auditory system is the analysis of acoustic signals into components corresponding to individual sound sources, which ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006